Non-uniform Instruction Scheduling
Abstract
Dynamic instruction scheduling logic is one of the most critical and cycle-limiting structures in modern superscalar processors, and it cannot easily be pipelined without significant performance loss. However, this loss is caused by only a small fraction of instructions, those that are intolerant of non-atomic scheduling. We first perform an empirical analysis of instruction streams to determine which instructions actually require single-cycle scheduling. We then propose a Non-Uniform Scheduler, a design that partitions the scheduling logic into two queues, each with dedicated wakeup and selection logic: a small Fast Issue Queue (FIQ) that issues critical instructions in back-to-back cycles, and a large Slow Issue Queue (SIQ) that issues the remaining instructions over two cycles, with a one-cycle bubble between dependent instructions. Finally, we propose and evaluate several steering mechanisms to effectively distribute instructions between the queues.
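The timing difference between the two queues can be illustrated with a toy model. The sketch below is not the design evaluated in the paper. It assumes single-cycle execution latency, unlimited issue width and queue capacity, and that the one-cycle bubble is charged whenever the consuming instruction sits in the SIQ. The names `Instr`, `issue_cycles`, and `chain` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    deps: list = field(default_factory=list)  # indices of producer instructions
    queue: str = "SIQ"                        # "FIQ" or "SIQ" (assumed labels)


def issue_cycles(instrs, latency=1):
    """Earliest issue cycle of each instruction under the assumed toy model.

    Assumption: a consumer in the SIQ sees one extra bubble cycle after each
    producer, while a consumer in the FIQ can issue back-to-back.
    """
    cycles = []
    for ins in instrs:
        bubble = 1 if ins.queue == "SIQ" else 0
        ready = 0
        for d in ins.deps:
            ready = max(ready, cycles[d] + latency + bubble)
        cycles.append(ready)
    return cycles


def chain(n, queue):
    """Build a serial dependence chain of n instructions in one queue."""
    return [Instr(f"i{k}", [k - 1] if k else [], queue) for k in range(n)]


fast = issue_cycles(chain(4, "FIQ"))  # [0, 1, 2, 3]
slow = issue_cycles(chain(4, "SIQ"))  # [0, 2, 4, 6]
print(fast, slow)
```

In this model a serial dependence chain takes roughly twice as long in the SIQ, while independent instructions are unaffected, which is the intuition behind steering only the critical instructions to the FIQ.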
Similar resources
Guaranteeing Forward Progress of Unified Register Allocation and Instruction Scheduling
Increasingly demanding computation requirements and tighter energy constraints have motivated distributed and/or hierarchical register file (DHRF) organizations as a means to efficiently sustain sufficient ALU utilization in processors targeting embedded applications with many ALUs. Compared to conventional centralized register file organizations, DHRFs lead to tighter coupling between registe...
User-level scheduling on NUMA multicore systems under Linux
Scheduling on multicore systems remains one of the most active and challenging topics in systems research. The introduction of non-uniform memory access (NUMA) multicore architectures further complicates the problem, because on NUMA systems the scheduler needs to consider not only the placement of threads on cores but also the placement of memory. Hardware performance counters and har...
Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors
The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache miss of any of the operations...
An effective and efficient code generation algorithm for uniform loops on non-orthogonal DSP architecture
To meet ever-increasing demands for higher performance and lower power consumption, many high-end digital signal processors (DSPs) employ a non-orthogonal architecture. This architecture is typically characterized by irregular data paths, heterogeneous registers, and multiple memory banks. Moreover, sufficient compiler support is important to harvest its benefits. However, usua...
Register allocation sensitive region scheduling
Because of the interdependences between instruction scheduling and register allocation, it is not clear which of these two phases should run first. In this paper, we describe how we modified a global instruction scheduling technique to make it cooperate with a subsequent register allocation phase. In particular, our cooperative global instruction scheduler performs region scheduling transformation...